195 research outputs found

    The Mirror DBMS at TREC-8

    The database group at the University of Twente participates in TREC-8 using the Mirror DBMS, a prototype database system designed especially for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can get sufficient performance, and to prepare for the very large corpus track, in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of global statistics on the ranking.

    The SIKS/BiGGrid Big Data Tutorial

    The School for Information and Knowledge Systems SIKS and the Dutch e-science grid BiG Grid organized a new two-day tutorial on Big Data at the University of Twente on 30 November and 1 December 2011, just preceding the Dutch-Belgian Database Day. The tutorial builds on some exciting new developments in large-scale data processing and data centers, initiated by Google and followed by many others such as Yahoo, Amazon, Microsoft, and Facebook. The course teaches how to process terabytes of data on large clusters, and discusses several core computer science topics adapted for big data, such as new file systems (Google File System and Hadoop FS), new programming paradigms (MapReduce), new programming languages and query languages (Sawzall, Pig Latin), and new 'noSQL' databases (BigTable, Cassandra and Dynamo).

    Runtime Optimizations for Prediction with Tree-Based Models

    Tree-based models have proven to be an effective solution for web ranking as well as other problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, given an already-trained model. Although tree-based models are exceedingly simple conceptually, most implementations do not efficiently utilize modern superscalar processor architectures. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures and significantly improve the speed of tree-based models over hard-coded if-else blocks. Our work contributes to the exploration of architecture-conscious runtime implementations of machine learning algorithms.
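
    The three ideas named in this abstract (array-based layout, predication, and micro-batching) can be illustrated with a minimal sketch. It is not the authors' implementation: the tree, its thresholds, and all names (FEATURE, THRESHOLD, LEAF_VALUE, predict_predicated, predict_batch) are hypothetical, and it assumes a complete binary tree stored in breadth-first order. Python does not expose the hardware effects the paper measures, so the sketch shows only the control-flow transformation, not the speedup.

```python
from typing import List, Sequence

# Hypothetical complete tree of depth 2, stored in flat arrays in
# breadth-first order: the children of node i are 2*i + 1 and 2*i + 2.
FEATURE: List[int] = [0, 1, 1]            # feature tested at each internal node
THRESHOLD: List[float] = [0.5, 0.3, 0.7]  # go right when x[feature] > threshold
LEAF_VALUE: List[float] = [10.0, 20.0, 30.0, 40.0]
DEPTH = 2
NUM_INTERNAL = 2 ** DEPTH - 1             # leaves start at this node index


def predict_branching(x: Sequence[float]) -> float:
    """Baseline: the same tree written as hard-coded if-else blocks."""
    if x[0] > 0.5:
        if x[1] > 0.7:
            return 40.0
        return 30.0
    if x[1] > 0.3:
        return 20.0
    return 10.0


def predict_predicated(x: Sequence[float]) -> float:
    """Predication: the comparison result (0 or 1) selects the child
    arithmetically, so the traversal loop contains no if-else."""
    node = 0
    for _ in range(DEPTH):
        go_right = int(x[FEATURE[node]] > THRESHOLD[node])
        node = 2 * node + 1 + go_right
    return LEAF_VALUE[node - NUM_INTERNAL]


def predict_batch(batch: Sequence[Sequence[float]]) -> List[float]:
    """Micro-batching: advance every instance by one tree level before
    moving to the next level, instead of finishing one instance at a time."""
    nodes = [0] * len(batch)
    for _ in range(DEPTH):
        for k, x in enumerate(batch):
            n = nodes[k]
            nodes[k] = 2 * n + 1 + int(x[FEATURE[n]] > THRESHOLD[n])
    return [LEAF_VALUE[n - NUM_INTERNAL] for n in nodes]
```

    In a compiled language the inner loop of predict_batch consists of independent iterations over contiguous arrays, which is the property the abstract relies on when it refers to exploiting modern processor architectures.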

    The role of evaluation in the development of content-based retrieval techniques

    Multimedia search without visual analysis: the value of linguistic and contextual information

    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.

    Better contextual suggestions in ClueWeb12 using domain knowledge inferred from the open web

    Proceedings of the 23rd Text Retrieval Conference (TREC 2014), held in Gaithersburg, Maryland, USA, in 2014. This paper provides an overview of our participation in the Contextual Suggestion Track. The TREC 2014 Contextual Suggestion Track allowed participants to submit personalized rankings using documents either from the Open Web or from an archived, static Web collection, the ClueWeb12 dataset. In this paper, we focus on filtering the entire ClueWeb12 collection to exploit domain knowledge from touristic sites available in the Open Web. We show that the recommendations generated for the provided user profiles and contexts improve significantly using this inferred domain knowledge. This research was supported by the Netherlands Organization for Scientific Research (NWO project #640.005.001).